Breakdown of Arabidopsis thaliana thioredoxins and glutaredoxins based on electrostatic similarity–Leads to common and unique interaction partners and functions

The reversible reduction and oxidation of protein thiols was first described as mechanism to control light/dark-dependent metabolic regulation in photosynthetic organisms. Today, it is recognized as an essential mechanism of regulation and signal transduction in all kingdoms of life. Proteins of the thioredoxin (Trx) family, Trxs and glutaredoxins (Grxs) in particular, catalyze thiol-disulfide exchange reactions and are vital players in the operation of thiol switches. Various Trx and Grx isoforms are present in all compartments of the cell. These proteins have a rather broad but at the same time distinct substrate specificity. Understanding the molecular basis of their target specificity is central to the understanding of physiological and pathological redox signaling. Electrostatic complementarity of the redoxins with their target proteins has been proposed as a major reason. Here, we analyzed the electrostatic similarity of all Arabidopsis thaliana Trxs, Grxs, and proteins containing such domains. Clustering of the redoxins based on this comparison suggests overlapping and also distant target specificities and thus functions of the different sub-classes including all Trx isoforms as well as the three classes of Grxs, i.e. CxxC-, CGFS-, and CC-type Grxs. Our analysis also provides a rationale for the tuned substrate specificities of both the ferredoxin- and NADPH-dependent Trx reductases.


Introduction
Redox modifications of cysteinyl side chains are a vital part of numerous signal transduction pathways, in photosynthetic organisms these mechanisms play a vital role, e.g. in light-dark adaptation [1][2][3].Redox modifications of protein thiols such as disulfide formation and reduction are catalyzed by members of the Trx family of proteins, i.e.Trxs and Grxs [3][4][5][6][7].This group of proteins share a common structural motif, the Trx fold.Cysteinyl residues in their active sites are the basis of their redox activity [8].The proteins of this family catalyze the oxidation and reduction of disulfides in target proteins, including glutathionylation-deglutathionylation of proteins.Trx family proteins are encoded in essentially all genomes and they were likely already present in the last universal common ancestor of all life forms an earth [9].Trxs and Grxs are present in all compartments of eukaryotic cells, e.g. the cytosol, ER, mitochondria, nucleus, and plastids-often in multiple isoforms [10].Most Trxs and Grxs have a broad, but distinct substrate specificity.Understanding the molecular basis of their target specificity is key to their physiological functions and for the understanding of redox regulation in general.This molecular basis is the focus of this work.
Trxs are efficient catalysts of thiol-disulfide exchange reactions and the trans-nitrosylation of cysteinyl side chains [11,12].During their reaction cycle, Trxs form a disulfide in their Cys-Gly-Pro-Cys active site.This disulfide is reduced by thioredoxin reductases (TrxRs).Photosynthetic organisms contain two types of TrxRs.First, the ferredoxin-dependent FTRs that couple the photosynthetic electron chain directly to redox regulation [13] and, second, the NADPHdependent NTRs that function, among others, in stress defense [14].Grxs are divided into three classes [4,15,16].The first, also named class-I or CxxC-type Grxs share a Cys-Pro-Tyr-Cys consensus active site motif and catalyze thiol-disulfide exchange reactions, often including glutathione (GSH).Oxidized Grxs are reduced by two molecules of GSH.The resulting GSH disulfide is reduced by GSH reductases at the expense of NADPH.The second subclass of the Grxs, named class-II or CGFS-type Grxs, with a consensus Cys-Gly-Phe-Ser active site, do not catalyze thiol-disulfide exchange reactions.Instead, they function in the regulation of iron metabolism or in the transfer of iron-sulfur centers [17][18][19].Two alternative loop structures preceding the active sites and different modes of GSH-binding resulting thereof are the basis for the fundamentally different functions of these two Grx classes [20,21].The proteins of the third class are restricted to land plants, they were named ROXYs or CC-type Grxs.These proteins function in the regulation of TGA family transcription factors.So far, neither redox, nor FeS transfer activity of these proteins was described that could be linked to their physiological functions.In most species, and in plants in particular, various Trx family protein isoforms were described and characterized in the same compartment, prompting questions on both overlapping and distinct functions.Proteomic studies screening for interaction partners and target proteins, also summarized in this work, imply a high degree of substrate specificity for the individual isoforms.
The basis for this specificity has long been a mystery.Previous hypotheses addressed the thermodynamics of the reaction, foremost the nucleophilicity of the more N-terminal active site thiol and the resulting differences in their standard redox potentials [22,23].Also favorable entropic contributions as the major recognition force controlling the specificity of these protein-protein interactions were suggested [24].Based on the analysis of Escherichia coli phosphoadenylyl sulfate reductase, we proposed that different electrostatic properties of the redoxins govern their target specificity and reactivity [25,26].A comprehensive analyses of Trx family proteins from human and various other species confirmed that primary and tertiary structure similarity do not correlate to the target specificity of the proteins.Instead, a number of examples clearly demonstrated the importance of electrostatic similarity for their target specificity [27].In this work, we have compared all redoxins encoded in the genome of the model plant Arabidopsis thaliana.Our results emphasize the importance of the electrostatic properties and suggest some unexpected close functional ties between all classes of redoxins.

Structures and molecular modeling
Available structures were obtained form the protein data bank (https://www.rcsb.org),all pdb entries are specified accordingly.The Swiss Model web server was used to perform molecular modeling [28][29][30][31].Based on the quality assessment provided, i.e. the lowest QMEAN with no major outlier in the global or local quality estimate of Cβ, all atom, solvation, or torsion, the final model was chosen from structures modeled with different templates that displayed the highest sequence identity with the target protein.S1

Sequence and structure comparison
Protein sequences were obtained from the uniprot resources.Alignments of the primary structures and generation of the corresponding distance trees were performed using the CLC sequence viewer (Qiagen bioinformatics, Hilden, Germany) and Clustal omega [32].PDBe-Fold server (v2.59, https://www.ebi.ac.uk/msd-srv/ssm/cgi-bin/ssmserver) was used for alignment of the three dimensional structures, the structure-based multiple sequence alignments were then prepared with ClustalW2-"Simple Phylogeny"-tool (https://www.ebi.ac.uk/Tools/ phylogeny/) with default settings.Corresponding distance trees, based on these primary structure and 3D structure alignments, were generated with the iTOL (https://itol.embl.de/)[33] viewer.

Electrostatic calculations
Alignment of the structures in the PDB files in the desired orientation was performed using UCSF chimera.The proteins electrostatic properties were computed from the pdbs as follows: using pdb2pqr with the amber force field, missing atoms were reconstructed, hydrogens added, atomic charges and radii assigned [34].Calculation of the electrostatic parameters was performed using the Adaptive Poisson-Boltzmann Solver (APBS) [35] within the vmd (visual molecular dynamics) software package [36].The parameters were set to the following values: 150 mM mobile ions, a temperature of 298.15 K, and a solvent dielectric constant of 78.54.From these data, images were rendered representing the secondary structures including the N-terminal active site cysteinyl residues facing forwards, the electrostatic potential mapped to the proteins water accessible surface from -4 (red) to 4 k�T�e -1 (blue), and the isosurfaces of the electrostatic potential at -1 (red) and 1 k�T�e -1 (blue).Applying ImageMagick a summary picture was generated.All steps following the 3D alignment were automatized, the scripts and a graphical interface are available at: https:// github.com/WillyBruhn/MutComp.

Electrostatic distances and clustering
Using the Gromov-Wasserstein-distance approach described before (27), we compared the 3D isofsurfaces of both the negative and positive electrostatic potential.Solving this problem, is NP-hard as the objective function is not convex, however, in polynomial time three lower bounds for the Gromov-Wasserstein-distance can be calculated.We limited the sample to 100 points randomly distributed on the isosurfaces and calculate the lower bound for them.Following an 500 times repetition, the values obtained were summarized in form of a histogram.This comparison was performed pairwise for all proteins.The earth-mover's-distance was used [37] to calculate the similarity between the histograms, the unweighted pair group method with arithmetic mean (UPGMA) was used for hierarchical clustering, see [27].This method yields the mean distance between all points from the new cluster to all points of another cluster and the results were displayed in form of a dendrogram.The software source code is available at: https://github.com/BerensF/ComparingProteins.

Interactome data and comparison
Interactome data were retrieved from the IntAct database [38] (as of Sept. 2018) as well as the BioGRID resources [39] (Vers.3.4, as of Sept. 2018).The uniprot resources (https://www.uniprot.org) was used to perform ID mapping and all entries are listed with their unique Uni-protKB ID.Additional resources for interactions, e.g. from some dedicated publications, were included, when available.S1 Table in S1 File (interactome) contain all identified entries.Computation of the matrix of common interactions and the Venn diagrams was performed using R (https://www.r-project.org)from RStudio (https://www.rstudio.com).

Results and discussion
In plants, Trx family proteins were originally named and classified according to their specificity for certain redox-regulated substrates.For Trxs, for instance: F-type (fructose-1,6-bisphophate phosphatase), H-type, and M-type (malate dehydrogenase).Subsequently, redoxins with similar primary structures were named accordingly with added consecutive numbers in order of their discovery.The complexity of the plant redox systems, summarized in Table 1, is astonishing.Given that a substantial number of them function in redox regulation and all of them will function through direct protein-protein interactions, the molecular determinants of their specificity for distinct and common target proteins are central to the understanding of their regulatory functions.We have proposed that complementary electrostatic patterns control this specificity [25][26][27].The aim of this study is to provide a thorough comparison between clustering and classification according to primary structure, 3-D structure, and electrostatic similarity.Further more, we discuss the implication of the clustering of electrostatic properties for the functional classification of plant Trx family proteins and potential overlaps in target proteins.The electrostatic characteristics and similarities were computed as outlined before [27].Table 1 also lists functions of the proteins as described in the literature.Potentially interacting proteins were collected from primary literature as well as interaction databases, i.e.IntAct and BioGRID (see methods section and S1 File).For ten proteins (GrxC5, GrxS14, GrxS16, TrxF2, TrxH1, TrxM1, TrxM2, TrxO1, TrxO2, and Trl33), 3-D structures were obtained from the protein data base (pdb).The remaining structures were predicted by homology modeling.

Grxs and Trxs in Arabidopsis thaliana
The A. thaliana genome encodes 31 Grxs or Grx domain-containing proteins and 37 Trxs or Trx domain-containing proteins (see Table 1).Unfortunately, only for ten of these proteins experimentally determined structures were available.Homology modeling of the others, however, yielded structure models with reliable quality parameters (see S1 Table in S1 File).The wealth of several 100 template structures for Trx family proteins allows the prediction of valid model structures in most cases.
Next, we compared the proteins based on their primary and tertiary structures as well as their electrostatic similarity.Both the clustering according to primary and 3D structures (see  groups (for all details, see S1 Fig in S1 File).Five of these groups (I, II, V, VI, and VII) show a split charge distribution, in this view negative charge on the left and positive on the right, with various unique features.Two of the groups (II and IV) display a predominantly positive potential on this surface with a more neutral area surrounding the N-terminal active site thiol.For comparison, we added an analysis of the only experimentally solved complex between a plant redoxin and a target protein, i.e. barley TrxH2 with barley alpha-amylase/subtilisin inhibitor protein (BASI) [73] (Fig 2B).This complex highlights the electrostatic complementary of the interaction surfaces, as described for other Trx-complexes before [25,74].It also illustrate that the electrostatic differences between the seven groups (Fig 2A ) must have a profound impact on their interaction profile.The data on potential interaction partners of plant redoxins published or deposited in the databases (see S1 Table) was limited.For the following redoxins a substantial number of potential interaction partners could be identified: GrxC1 30 unique interactions, TrxH1 45, TrxH3 70, TrxH5 75, and TrxH7 97.From the total of 317 entries, 280 were only reported as interaction partner for one of these redoxins.The highest overlap (see the Venn diagram in Fig 3) was detected between TrxH5 and TrxH7 (n = 24), the second highest between GrxC1 and TrxH7 (n = 7), and the third highest between TrxH3 and TrxH5 (n = 5).Of course, the data basis used for this comparison (literature and databases) was neither representative, nor obtained in a systematic or comparable way.All of the proteins mentioned before are located in the third major group (II-VII) of our tree (Fig 1), in favor of some common interaction partners.The CC-type/ROXY Grxs physically and genetically interact with different members of the TGA transcription factor family, however the nature of these interactions (redox modifications or sole binding) and their interactions with potential other target proteins were mentioned as "one of the challenges in the future" in a previous review article [16].In both primary and tertiary structure clustering, this Grx class-III subfamily form a homogeneous group (Fig 1A and  1B).With respect to the electrostatic clustering, these proteins are spread over all six of the seven clusters with a concentration around clusters III and IV (green labels in Fig 1C).This may indicate that theses CC-type/ROXY Grxs interact with specific sets of different target proteins, with some overlapping functions, for instance of GrxC11, GrxS4, GrxS7, and GrxS8 (see cluster 'V' in Fig 1C).The closely related GrxC7 and C8 (ROXY1 and 2) both interact with the transcription factors PAN, TGA9, and TGA10 (summarized in ref. [16]), see S1 Table ).This is also consistent with their predicted similarity in electrostatic features (see cluster 'IV' in Fig 1C and S1 Fig in S1 File).The high similarity in the predicted electrostatic characteristics between the different CC-type/ROXY Grxs with classical Cys-X-X-Cys-type Trxs and Grxs may indicate that proteins from all major groups share some interaction partners; a hypothesis that remains to be proven.

Electrostatic similarity of redoxins from photosynthetic species compared to redoxins from other species
Previously, we have reported the comparison of the electrostatic characteristics of all experimentally solved redoxin structures [27].Based on these data, we have now focused on a comparison of the redoxins from photosynthetic organisms with the redoxins from various other (B) Similarity tree based on the similarity of the 3D structures extracted from the pdb and generated by homology modeling; the tree was computed using PDBeFold server (https://www.ebi.ac.uk/msd-srv/ssm/cgi-bin/ssmserver) and the iTOL (https://itol.embl.de/)(C) The electrostatic similarity of the whole proteins was computed as outlined in the methods section; the tree was generated using 'R', the tree representation was done with iTOL (https://itol.embl.de/).The protein abbreviations highlighted in green correspond to the 'ROXY' classes in A and B. https://doi.org/10.1371/journal.pone.0291272.g001species.We excluded mutated proteins, extracted the redoxin domains if required, excluded other molecules in the pdb files, and focused on one representative structure when multiple entries were available in the pdb structure.We have also excluded all proteins reported to function in oxidative protein folding as well as peroxidases such as peroxiredoxins and GSH peroxidases.The electrostatic similarity tree based on this data set separates into eleven branches, marked as I-XI in Fig 4 .Most of these branches include structures from both the Trx and Grx subfamilies.These branches do not correspond to branches of trees generated by comparing primary or tertiary structure similarity [27].The redoxins from the photosynthetic organisms, as for all other branches of life, do not form separate groups.We speculate that electrostatic similarity corresponds to target specificity, as reported for the model target E. coli PAPS reductase [25].This, in turn, implies (electrostatically) similar targets for some redoxins across species barriers.The cytosolic H-type Trxs (1xfl, 1ep7, 2vm1, 2vlt, 1wmj, 1ti3) for instance, are spread over branches VII-XI, the two plastidal M-type Trxs (1dby, 1fb0) are located in branch X. Noteworthy, this branch contains also a number of confirmed electron donors for ribonucleotide reductase from E. coli (E. coli Grx1, 1egr; E. coli Trx1, 1xoa; and human Trx1, 1ert) as well as two Trxs from the H-type (Hordeum vulgare TrxH1 and H2, 2vm1 and 2vlt, respectively).Plant genomes encode a particularly high number of Trx-family proteins.In general, the number of redoxins, together with the genome size in general, increased during the evolution of more complex organisms.However, the human genome encodes 'only' 35, the E. coli genome seven Trx family proteins.The high number of Trx-family proteins in plants must reflect the importance of redox regulation and cellular redox homeostasis in plant biology.Four functions stand out in particular.First, chloroplast and photosynthesis functions, redoxins are essential for maintaining the redox state of photosynthetic enzymes and regulating the light reactions [75].Second, the regulatory functions of the CC-/ROXY-/class-III-type Grxs that are unique to plants [76], 21 of the 66 Arabidopsis redoxins belong to this group (Table 1).Third, the wide verity of environmental challenges that plants are exposed to, such as temperature variations or droughts.And fourth, specific functions in distinct developmental stages, for instance during seed germination [77].As outlined above, many plant redoxins show high similarity in their electrostatic properties to their counterparts from other domains of life (Fig 5).The redoxins from the electrostatic similarity groups III and IV (see Figs 1 and 2), however, do not resemble Trx-family proteins from other organisms.These two closely related groups include 81% of the CC-/ROXY-/class-III-type Grxs, and the two F-type Trxs, all of which are unique to plants.Taken together, our results suggest that plants evolved their high variety of Trx-family proteins to fine tune specific functions, similar to all higher life forms, and to cope with unique new targets and regulatory needs.

Thioredoxin reductases
Unlike other organisms, plants contain two types of Trx reductases.The NTRs include the cytosolic and mitochondrial NTR1 and 2 (also named NTRA and NTRB) [14] and the plastidal NTRC that, unlike all other TrxRs, contains its own C-terminal Trx domain [78].To better understand their reactivity with the various Trx isoforms, we also analyzed the electrostatic properties of A. thaliana FTR, NTR1, NTR2, and NTRC (excluding its Trx domain, see Fig 5).The structure of A. thaliana FTR from FTR-TrxF2 complex published in [79] (pdb 7c2b) was used for comparison.NTRs must undergo substantial conformational changes during their reaction cycle.In all experimentally determined structures, the two cysteinyl residues that reduce the Trx substrates in the dithiol-disulfide exchange reactions face the FAD domain for electron transfer form NADPH.When reducing Trxs, this domain must open up to interact with the Trx.We therefore focused on the surrounding of the reactive cysteinyl residues.The structure of NTR2 [80,81] served as template for the modeling of NTR1 and NTR3.Our results show some major differences between the two reductase classes.First, the FTRs show a particular large interaction surface with the Trxs of about 756 Å 2 .This is considerably larger than for any other TrxR including the bacterial and human enzymes with around 560 and 450 Å 2 , respectively [82,83].The protruding geometry of the active site thiols of the NTRs (Fig 5 ) suggest a smaller interactions surface for these as well.Second, the electrostatic properties of both classes differ significantly.While the NTRs, including NTRC, exhibit the division of potentials in larger negative and positive potentials with the active site right in between them, the FTR displays a very negative potential facing the Trx interaction surface (Fig 5).When comparing for complementarity, this indicates that NTRs should interact preferably with Trxs that also contain their active sites at the border between a larger negative and positive potential.This is true for most Trxs [25,27].For the Arabidopsis Trxs (see Fig 2A) this is the case for the Trxs of the clusters I, II, V, VI, and VII.While FTR may preferentially interact with Trxs of clusters III and IV, such as TrxF1 and F2 (Fig 2A).This suggests, in agreement with a previous report [84], higher efficiencies of FTR with the F-type Trxs.Under competitive conditions, these should be the preferred substrates over the M-type TrxM1 (cluster VII), as well as TrxM3 and TrxM4 (both cluster V) implying a hierarchical organization of light-dependent redox regulation in the chloroplast.Noteworthy, TrxM2 is most similar to TrxF1 (both cluster IV).

Conclusions
Here, we have applied a mathematical model developed for the automated clustering of proteins, in our case members of the Trx family of proteins, according to their electrostatic similarity.The analysis of the plant redoxin structures clearly demonstrates that primary and tertiary structure similarity do not correlate to electrostatic similarity and presumably function, see [25].The least understood group, the CC-/ROXY-/class-III-type Grxs, for instance, contains various members with high electrostatic similarity to Trxs F1 and F2 (Fig 1C).The clustering according to electrostatic properties represents a model to understand both the overlapping and distinct target specificity of the plant redoxins in their different cellular compartments, they also provide clear clues for the necessity for multiple TrxRs within a given compartment.
Fig 1A and 1B) yielded very similar results.The proteins fall into the previously established classes, i.e. for the Grxs the CxxC-/class-I-type, the CGFS-/class-II-type, and the CC-/ ROXY-/ class-III-type, for the Trxs the Trx-f, Trx-h, Trx-m, Trx-o, Trx-x/y, Trx-like (Trl), and nucleoredoxin (Nrx) groups.The comparison of the proteins' electrostatic features (Fig 1C, see also Fig 2 and S1 Fig in S1 File), yielded a completely different picture, not in accordance with the current classification systems.The proteins fall into three major groups with in total seven subgroups (Fig 1C, I-VII).All of these groups contain proteins from the Trxs and Grxs families and different sub-classes within these families.Fig 2A displays the electrostatic properties of the protein-protein interaction surfaces of representative members of each of these seven

Fig 1 .
Fig 1. Clustering of Arabidopsis thaliana redoxins.(A) Phylogram based on primary structure comparison, computed by Clustal Omega and CLC sequence viewer.(B)Similarity tree based on the similarity of the 3D structures extracted from the pdb and generated by homology modeling; the tree was computed using PDBeFold server (https://www.ebi.ac.uk/msd-srv/ssm/cgi-bin/ssmserver) and the iTOL (https://itol.embl.de/)(C) The electrostatic similarity of the whole proteins was computed as outlined in the methods section; the tree was generated using 'R', the tree representation was done with iTOL (https://itol.embl.de/).The protein abbreviations highlighted in green correspond to the 'ROXY' classes in A and B.

Fig 2 .
Fig 2. Electrostatic features of the active site contact areas of exemplary redoxins from the seven electrostatic groups of Arabidopsis thaliana redoxins.(A) Representative redoxins for each group.Depicted are the cartoon representations of the redoxins with helices in purple, sheets in yellow and the N-terminal CxxC cysteinyl residues in ball-and-stick model.and the electrostatic potential isosurfaces at + and-1 k�T�e -1 , blue: positive, red negative potential.(B) Interaction between barley (Hordeum vulgare, H.v.) TrxH2 and barley alpha-amylase/subtilisin inhibitor protein (BASI), pdb entry: 2iwt.The protein complex was opened (middle) to expose the interaction surfaces highlighted in cyan.The electrostatic potential isosurfaces of the interaction surfaces are shown at + and-1 k�T�e -1 .The orientation of TrxH2 in (B) is the same as one for the other Trxs in (A).https://doi.org/10.1371/journal.pone.0291272.g002

Fig 3 .
Fig 3. Venn diagram of the potential overlapping interaction partners of the indicated redoxins.The potential or suggested interaction partners are listed in the S1 Table in S1 File and were received from the literature and various interaction data bases.https://doi.org/10.1371/journal.pone.0291272.g003

Fig 4 .
Fig 4. Hierarchical clustering of all redoxins with experimentally determined structures in the pdb.The electrostatic similarity of the whole proteins was computed as outlined in the methods section.All redoxins from photosynthetic organisms were highlighted.The color codes were included in the figure.Modified from reference [27].https://doi.org/10.1371/journal.pone.0291272.g004

Fig 5 .
Fig 5. Electrostatic features of the active site contact areas of the thioredoxin reductases from Arabidopsis thaliana.(A-C) the contact surface of FTR with TrxF2 (pdb 7c2b) computed with UCSF Chimera.(D-F) the electrostatic features of FTR from 7c2b.The image of the electrostatic isosurface (E) was scaled down by a factor of 5 for better visibility, lines represent the scale.(G-I) electrostatic features of NTR2 domain containing the active site cysteinyl residues analyzed based on pdb entry 1vdc (the FAD domain was removed as explained in the maintext).
Table in S1 File summarizes the individual template structures used as well as the QMEAN values.